Abstract. We study the feasibility and noise sensitivity of portfolio 
optimization under some downside risk measures (Value-at-Risk, Expected 
Shortfall, and semivariance) when they are estimated by fitting a parametric 
distribution on a finite sample of asset returns. We find that the existence of the 
optimum is a probabilistic issue, depending on the particular random sample, in 
all three cases. At a critical combination of the parameters of these problems we 
find an algorithmic phase transition, separating the phase where the optimization 
is feasible from the one where it is not. This transition is similar to the one 
discovered earlier for Expected Shortfall based on historical time series. We 
employ the replica method to compute the phase diagram, as well as to obtain 
the critical exponent of the estimation error that diverges at the critical point. 
The analytical results are corroborated by Monte Carlo simulations. 
1. Introduction 

Portfolio optimization is one of the fundamental problems of financial theory. The first 
treatment of the topic appeared in the famous work by Markowitz [1], who measured 
risk by the standard deviation of asset price fluctuations. In this context, portfolio 
optimization consists in minimizing the variance of the portfolio return given the 
expected return and the budget constraint. Although this defines a straightforward 
mathematical problem, the statistical properties of the solution turn out to be non- 
trivial when the covariances of the asset returns are estimated from a finite sample. 

An extensive investigation of the noise sensitivity of the Markowitz portfolio 
optimization problem [2, 3, 4] revealed that for normally distributed asset returns 
the expected value of the ratio $q_0$ of the risk of the estimated optimum and that of 
the true optimum is proportional to $(1 - N/T)^{-1/2}$, where $N$ is the number of assets 
in the portfolio and $T$ is the sample size (number of observation periods). In other 
words, the estimation error diverges as $T \to N$, and, in order to reduce the estimation 
error to a reasonable level, one needs a fairly large sample. Moreover, the estimated 
optimal portfolio weights exhibit dramatic fluctuations from one sample to another, 
and these fluctuations decay very slowly with increasing sample size. Covariance 
matrix filtering techniques based on Bayesian Shrinkage [5, 6, 7] and Random Matrix 
Theory [8, 9, 10, 11, 12] were shown to effectively reduce $q_0$ E]; however, these 
techniques do not generally suppress the large fluctuations of the estimated portfolio 
weights. 

In addition to the noise sensitivity of the classical standard deviation, Kondor et 
al. [13] also examined the sensitivity of portfolio optimization under a few alternative 
risk measures, such as Mean Absolute Deviation [14], Maximal Loss [15] and Expected 
Shortfall [16, 17]. All of these were found to be even more susceptible to sample 
fluctuations than standard deviation, and in addition, Expected Shortfall (and 
Maximal Loss as its special case) displayed an additional instability in that the very 
existence of the optimum turned out to depend on the sample and the probability of 
the existence of an optimum was found to be less than one for any finite sample size. 
In other words, even if Expected Shortfall has a well defined minimum for a given 
asset return distribution, it may not have an optimum on a finite sample generated 
by that distribution. 

Expected Shortfall is perhaps the simplest and intuitively most appealing 
example of the celebrated Coherent Risk Measures [18, 19], which were introduced in 
response to the widespread use of ad-hoc risk measures (including Value-at-Risk) with 
poor theoretical foundation and well-known shortcomings. However, the instability 
discovered on the example of Expected Shortfall raised the suspicion that Coherent 
Risk Measures, all their axiomatic beauty notwithstanding, may be highly susceptible 
to sampling error in general. Indeed, this conjecture has been proved to be true by 
showing that no Coherent Measure of Risk has a minimum, if there exists a portfolio 
that produces positive returns for all observations on the given sample [20]. 

The studies mentioned above were based on non-parametric estimators of the 
risk measures in consideration, without any a priori assumption about the sample 
generating process. However, estimators based on historical time series are notoriously 
unstable, so it is legitimate to ask whether parametric estimation could suppress the 
instability. Moreover, Value-at-Risk is often measured in practice by parametric 
estimation using some assumption about the probability distribution of the asset 
returns [21]. Since in practice VaR is the most important measure in use today, 
and, furthermore, its parametric estimation is analogous to that of ES, it is a natural 
idea to study the stability of portfolio optimization under VaR and ES by fitting a 
multivariate Gaussian distribution on the sample of asset returns. The main objective 
of this paper is to decide whether the instability of historical ES (and VaR) estimation 
can be circumvented by parametric estimation. For the sake of simplicity we are also 
going to assume that the data generating process is itself Gaussian. It will turn out 
that, although parametric fitting reduces the instability, it does not eliminate it. 

It should be noted that this paper, like the earlier studies mentioned above, 
investigates the noise sensitivity of global risk minimization, without imposing any 
constraint on the expected return. This is a special case of the practically more relevant 
risk-reward optimization problem. It is clear, however, that adding a linear constraint 
to the global minimum risk problem does not essentially change the noise sensitivity 
characteristics. Focusing on the simpler problem makes it easier to understand and 
identify the effects and consequences of sampling error, while leaving 
open the possibility of revisiting the more general problem later. 

The rest of the paper is organized as follows. Section 2 is a brief overview 
of earlier results on the instability of the minimization of Expected Shortfall with 
non-parametric estimation. In Section 3 we solve the ES/VaR minimization problem 
assuming that asset returns follow a multivariate normal distribution with explicitly 
known means, variances and covariances, and we derive the condition for the solution 
to exist. In Section 4 we investigate the feasibility and noise sensitivity of ES/VaR 
minimization when the parameters of the asset return distribution are estimated from 
finite samples. This section, which constitutes the backbone of our paper, is divided 
into several subsections: in 4.1 we introduce some notations and terminology, in 4.2 
we use the replica method to characterize the critical behavior of the finite sample 
instability of the optimization problem, in 4.3 we generalize these results to the case 
of correlated asset returns, in 4.4 we back up our findings with simulation, and finally 
in 4.5 we apply our results to the special case of semivariance minimization. The 
paper ends with a brief summary. 

2. The noise-sensitivity of Expected Shortfall minimization with 
non-parametric estimation 

To put our discussion in context, we provide a brief overview of the results for 
the minimization of Expected Shortfall using a non-parametric estimator. Expected 
Shortfall is the mean value of losses exceeding a high threshold (referred to as the 
confidence level) specified in probability rather than in money. For instance, at 
confidence level $\alpha$ the Expected Shortfall (ES$_\alpha$) of an investment is the average of 
losses that occur in the $(1 - \alpha) \cdot 100$ percent of the worst cases. 

Historical ES based on a finite sample consisting of $T$ observations can be 
estimated by sorting these observations into ascending order and computing the 
average of the $T(1 - \alpha)$ smallest values. Special care must be taken, however, when 
$T(1 - \alpha)$ is not an integer: in such a case one of the observations has to 
be 'split'. (For the precise definition of ES$_\alpha$ see for instance [17].) It was shown in 
[22] that within this scheme portfolio optimization is equivalent to a convex linear 
programming problem. This is to be contrasted with the case of VaR, which, as a 
quantile, has no reason to be convex, and, indeed, is often found to be non-convex 
when estimated from historical time series. (This is why the problem of the noise 
sensitivity of VaR was ignored in [15]: in a sense historical VaR is always unstable.) 
Figure 1. The boundary between the feasible and unfeasible phases of the 
Expected Shortfall minimization problem on the $N/T$ vs $\alpha$ plane, in the $N \to \infty$ 
and $N/T = \mathrm{const}$ limit. 



The highly desirable property of convexity has made ES very popular with academics, 
though ES is still very far from replacing VaR in practice or regulation. 
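
The sorting-and-splitting recipe for historical ES described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimator of [17]: the function name `historical_es`, the sign convention (ES is reported as a positive loss, i.e. minus the average of the worst returns) and the way the fractional observation is weighted are choices made here for concreteness.

```python
import math


def historical_es(returns, alpha):
    """Historical Expected Shortfall at confidence level alpha.

    Averages the T*(1 - alpha) smallest returns. When T*(1 - alpha) is not
    an integer, the next observation enters with the remaining fractional
    weight (the 'split' observation). The result is minus that average.
    """
    ordered = sorted(returns)             # ascending: worst outcomes first
    k = len(ordered) * (1.0 - alpha)      # (possibly fractional) tail size
    if k <= 0:
        raise ValueError("alpha must be smaller than 1")
    whole = math.floor(k)
    tail_sum = sum(ordered[:whole])
    if whole < len(ordered):
        tail_sum += (k - whole) * ordered[whole]   # fractional part
    return -tail_sum / k
```

With four observations and $\alpha = 0.5$ the function simply averages the two worst returns; with $\alpha = 0.625$ the tail contains one and a half observations, so the second-worst return enters with weight 1/2.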

As mentioned in the introduction, the noise sensitivity of ES optimization was 
examined in [13]. That study used a simulation-based approach and assumed, 
for simplicity, iid normal asset returns. The (linear programming based) portfolio 
optimization algorithm was performed on a large number of such samples and the 
existence and distribution of the solution were investigated. The main findings of this 
study are the following: 

• ES as a risk measure is much more sensitive to sample-to-sample fluctuations 
than the variance. 

• On some samples ES does not even have a minimum but diverges to minus infinity. 

• The probability of the existence of the optimum depends on the confidence level 
$\alpha$, as well as on the ratio between the number of assets $N$ and the number of 
observations $T$. 

• In the limit where $N \to \infty$ and $N/T$ is held constant the probability of the 
existence of the optimum tends either to 1 or to 0. On the $N/T$ vs $\alpha$ plane the zero 
probability (unfeasible) and unit probability (feasible) regions are separated by a 
well defined curve (the phase diagram), which was first determined by simulations 
[13], then computed analytically by the replica method [23] (see Figure 1). 

In practical applications the confidence level is typically $\alpha > 0.9$, and as shown by 
Figure 1 in that region the critical $N/T$ ratio is very close to 1/2. This means that 
in the practically relevant cases one must have at least twice as many observations 
as the number of assets in order to ensure even the mere existence of an optimal 
portfolio. (And, of course, a much larger sample is needed to make the estimation 
error reasonably low.) Moreover, the critical value of the ratio $N/T$ decreases for 
decreasing confidence level, which implies that ES optimization becomes more and 
more unstable, requiring larger and larger samples to give a meaningful result. 



3. The minimization of parametric ES and VaR for Gaussian asset returns 

The sensitivity of Expected Shortfall to sample fluctuations casts a shadow of doubt 
on its practical applicability in portfolio selection. However, one may wonder whether 
this instability is not due to the use of raw data in historical estimation and whether a 
parametric method might be more robust against sample-to-sample fluctuations.‡ In 
order to decide the question, we are going to look into the noise sensitivity of portfolio 
selection in the simplest setting, that is when the underlying process is iid normal and 
when the risk is estimated by fitting a normal distribution to the sample. This is a 
standard procedure for VaR estimation [21], but the ES and VaR estimators are so 
closely related that we can examine them together. 

When the return of an asset $X$ is normally distributed with mean $\mu$ and standard 
deviation $\sigma$, then both its VaR and ES can be written in the form 

$$\mathcal{R}(X) = \phi(\alpha)\,\sigma - \mu. \tag{1}$$

The particular form of the function $\phi(\alpha)$ depends on whether we are computing VaR 
or ES: 

$$\phi(\alpha) = \begin{cases} \Phi^{-1}(\alpha) & \text{for VaR,} \\[4pt] \dfrac{1}{1-\alpha} \displaystyle\int_{\alpha}^{1} \Phi^{-1}(p)\,dp = \dfrac{e^{-[\Phi^{-1}(\alpha)]^2/2}}{(1-\alpha)\sqrt{2\pi}} & \text{for ES,} \end{cases} \tag{2}$$

where $\Phi^{-1}(\alpha)$ denotes the inverse of the standard normal cumulative distribution 
function (or error function): 

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy. \tag{3}$$

We assume that $\phi(\alpha)$ is nonnegative and invertible in its domain§, and we will often 
omit its dependence on $\alpha$ in the notation. All the relevant quantities depend on $\alpha$ 
only through $\phi(\alpha)$. 
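
As a quick numerical illustration of the two forms of $\phi(\alpha)$, the following sketch evaluates them with the standard-library `statistics.NormalDist` class. The function names `phi_var`, `phi_es` and `gaussian_risk` are introduced here for illustration only and are not part of the original treatment.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def phi_var(alpha):
    """phi(alpha) for parametric VaR: the alpha-quantile of N(0, 1)."""
    return STD_NORMAL.inv_cdf(alpha)


def phi_es(alpha):
    """phi(alpha) for parametric ES: density at the quantile over (1 - alpha)."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(alpha)) / (1.0 - alpha)


def gaussian_risk(mu, sigma, alpha, measure="ES"):
    """Risk phi(alpha) * sigma - mu of a normally distributed return."""
    phi = phi_es(alpha) if measure == "ES" else phi_var(alpha)
    return phi * sigma - mu
```

At $\alpha = 0.99$ this gives $\phi \approx 2.33$ for VaR and $\phi \approx 2.67$ for ES, so at the same confidence level ES always assigns the larger multiplier to the standard deviation.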

Let us now assume that we have $N$ assets in the portfolio and their returns $x_i$ 
follow a multivariate normal distribution with means $\mu_i$ and variances/covariances 
$\sigma_{ij}$ (where $i, j = 1, 2, \dots, N$). A portfolio is simply a vector with components $w_i$ 
representing the amount invested in asset $i$. Then the expected value and the variance 
of the portfolio return will be $\sum_{i=1}^{N} w_i \mu_i$ and $\sum_{i,j=1}^{N} \sigma_{ij} w_i w_j$, respectively. According 
to (1), ES and VaR can then be written as: 

$$\mathcal{R}_\phi(\{w_i\}) = \phi \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} w_i w_j} - \sum_{i=1}^{N} w_i \mu_i. \tag{4}$$

The optimal portfolio can be found by minimizing $\mathcal{R}_\phi(\{w_i\})$ subject to the budget 
constraint 

$$\sum_{i=1}^{N} w_i = 1. \tag{5}$$



‡ We are obliged to M. Gordy for a stimulating discussion on this point. 

§ This means that for VaR we only allow confidence levels between 0.5 and 1. This is, however, not 
a real restriction, since VaR does not make sense as a risk measure for $\alpha < 0.5$. 
It is easy to see that this optimization problem is equivalent to minimizing the following 
Lagrangean: 

$$\mathcal{L}(\{w_i\}, z, \lambda, \eta) = \phi\sqrt{z} - \sum_{i=1}^{N} w_i \mu_i + \lambda \left( \sum_{i=1}^{N} w_i - 1 \right) + \eta \left( \sum_{i,j=1}^{N} \sigma_{ij} w_i w_j - z \right), \tag{6}$$

where $\lambda$ is used to enforce the budget constraint while $z$ and $\eta$ have been introduced 
to make the objective function quadratic in the portfolio weights. The minimization 
of $\mathcal{L}$ is a routine task, and it turns out that the optimum exists if and only if the 
covariance matrix $\sigma_{ij}$ is non-singular and 



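$$B^2 - AC + A\phi^2 > 0, \tag{7}$$

where we introduced the notations $A = \sum_{ij} \sigma^{-1}_{ij}$, $B = \sum_{ij} \sigma^{-1}_{ij} \mu_j$ and $C = \sum_{ij} \mu_i \sigma^{-1}_{ij} \mu_j$. As long as these conditions are satisfied, the solution is given by 

$$w_i^* = \left[ B^2 - AC + A\phi^2 \right]^{-1/2} \sum_{j=1}^{N} \sigma^{-1}_{ij} (\mu_j - \lambda^*), \tag{8}$$

$$\lambda^* = \frac{B}{A} - \frac{1}{A} \left[ B^2 - AC + A\phi^2 \right]^{1/2}, \tag{9}$$

$$\eta^* = \frac{1}{2} \left[ B^2 - AC + A\phi^2 \right]^{1/2}, \qquad z^* = \frac{\phi^2}{B^2 - AC + A\phi^2}. \tag{10}$$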



Condition (7) makes it clear that the existence of an optimal portfolio is 
not automatically guaranteed, but depends on the parameters of the underlying 
distribution (specifically on the expected values and covariances of the asset returns). 
When these parameters are estimated from a random sample, the fulfillment or 
violation of (7) (i.e. the feasibility of the optimization problem) will also be a random 
event. 
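
To make the feasibility condition concrete, the sketch below checks $B^2 - AC + A\phi^2 > 0$ for given means and covariances and, when it holds, returns the minimizing weights. The closed form used for the weights follows from the stationarity conditions of $\phi\sqrt{w^{T}\sigma w} - w^{T}\mu$ under the budget constraint as worked out for this example; the helper names `solve` and `optimal_portfolio`, and the pivot tolerance `1e-12`, are assumptions of this sketch.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def optimal_portfolio(mu, sigma, phi):
    """Weights minimizing phi*sqrt(w' sigma w) - w' mu with sum(w) = 1.

    Returns None when B^2 - A*C + A*phi^2 <= 0, i.e. when no optimum exists.
    """
    n = len(mu)
    u = solve(sigma, [1.0] * n)            # sigma^{-1} applied to (1, ..., 1)
    v = solve(sigma, mu)                   # sigma^{-1} applied to mu
    a = sum(u)                             # A
    b = sum(v)                             # B
    c = sum(m * x for m, x in zip(mu, v))  # C
    disc = b * b - a * c + a * phi * phi
    if disc <= 0:
        return None
    root = math.sqrt(disc)
    lam = (b - root) / a                   # multiplier of the budget constraint
    return [(vi - lam * ui) / root for vi, ui in zip(v, u)]
```

For uncorrelated assets with zero means the function returns equal weights, while a sufficiently large spread in the expected returns makes it return `None`: the risk can then be pushed to minus infinity by an ever larger long-short position.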



4. The stability of parametric ES and VaR optimization on finite samples 

4.1. The characterization of noise sensitivity 

Let us now assume the position of an investor who knows that the returns are Gaussian, 
but does not know the parameters (i.e. the means, variances and covariances) of the 
distribution, so she has to estimate them from a finite sample. Let us assume she 
makes T independent observations, each consisting of a vector of N realized asset 
returns. This sample can be represented by an $N \times T$ matrix with elements $x_{it}$ equal 
to the realized return of asset $i$ over time period $t$ ($i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$). 
The means $\mu_i$ and covariances $\sigma_{ij}$ can be estimated by the unbiased estimators 

$$\hat{\mu}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}, \tag{11}$$

$$\hat{\sigma}_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} (x_{it} - \hat{\mu}_i)(x_{jt} - \hat{\mu}_j). \tag{12}$$

Then the risk of portfolio $\{w_i\}$ can be estimated by substituting $\hat{\mu}_i$ and $\hat{\sigma}_{ij}$ into (4). 
Let us denote this estimated risk by $\hat{\mathcal{R}}_\phi(\{w_i\})$. Now we can ask two fundamental 
questions: 

(i) Does $\hat{\mathcal{R}}_\phi(\{w_i\})$ have a minimum? 

(ii) If it does, how far is this minimum from the real optimum? 
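
The first question can be explored empirically with a small Monte Carlo experiment: draw iid standard normal samples, compute the sample means and unbiased covariances, and record how often the feasibility condition $B^2 - AC + A\phi^2 > 0$ holds for the estimates. The code below is a minimal sketch of such an experiment; the function names, the number of trials and the seed are arbitrary choices made here, and the pure-Python linear solver is only meant for small $N$.

```python
import random


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def sample_estimates(x):
    """Sample means and unbiased covariances of an N x T return matrix."""
    n, t = len(x), len(x[0])
    mu = [sum(row) / t for row in x]
    dev = [[xi - m for xi in row] for row, m in zip(x, mu)]
    cov = [[sum(p * q for p, q in zip(dev[i], dev[j])) / (t - 1)
            for j in range(n)] for i in range(n)]
    return mu, cov


def is_feasible(mu, cov, phi):
    """True when B^2 - A*C + A*phi^2 > 0 for the given parameters."""
    n = len(mu)
    try:
        u = solve(cov, [1.0] * n)
        v = solve(cov, mu)
    except ValueError:
        return False                      # singular covariance estimate
    a, b = sum(u), sum(v)
    c = sum(m * vi for m, vi in zip(mu, v))
    return b * b - a * c + a * phi * phi > 0


def feasibility_frequency(n, t, phi, trials=200, seed=0):
    """Fraction of iid standard normal samples on which an optimum exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [[rng.gauss(0.0, 1.0) for _ in range(t)] for _ in range(n)]
        mu, cov = sample_estimates(x)
        hits += is_feasible(mu, cov, phi)
    return hits / trials
```

Calling, for example, `feasibility_frequency(4, 400, 2.33)` and then lowering `t` towards `n` shows the frequency dropping from essentially one to lower values.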

Question (i) can be answered by checking whether condition (7) is fulfilled by $\hat{\mu}_i$ and 
$\hat{\sigma}_{ij}$. Moreover, the matrix $\hat{\sigma}_{ij}$ is positive semidefinite by construction, therefore the 
estimated optimum is unique, provided that it exists. As for Question (ii), first we 
need to specify how to measure the distance from the real optimum. To this end, we 
use the generalization of the measure $q_0$ introduced in [3], which in the present case 
is defined as follows. 

If we know the parameters $\mu_i$ and $\sigma_{ij}$ of the data generating distribution - for 
instance, in a simulation study like the ones in [3] and [13] - we explicitly know 
the true risk function $\mathcal{R}_\phi(\{w_i\})$. Let us assume that the data generating process is 
such that $\mathcal{R}_\phi$ has a minimum under the budget constraint, and let us denote the 
corresponding optimal weights by $w_i^*$. Our hypothetical investor, however, only knows 
the estimators $\hat{\mu}_i$ and $\hat{\sigma}_{ij}$, so she will minimize the estimated risk function $\hat{\mathcal{R}}_\phi(\{w_i\})$. 
Assuming that it exists, let this estimated optimum be $\hat{w}_i^*$. Although the investor 
might have the impression that this portfolio has risk $\hat{\mathcal{R}}_\phi(\{\hat{w}_i^*\})$, we know that its real 
risk is $\mathcal{R}_\phi(\{\hat{w}_i^*\})$ which, by definition, cannot be smaller than the risk in the true optimum 
$\mathcal{R}_\phi(\{w_i^*\})$. Therefore, the quantity 

$$q_0 = \frac{\mathcal{R}_\phi(\{\hat{w}_i^*\})}{\mathcal{R}_\phi(\{w_i^*\})} \tag{13}$$

is a natural dimensionless measure of the distance of the estimated optimum from the 
true optimum. Moreover, the number $q_0 - 1$ has a straightforward interpretation: it is 
the percentage increase in the optimal risk the investor has to face due to the sampling 
error. 

The properties of $q_0$ have been extensively studied both numerically and 
analytically for the case of global variance optimization [3j HI [24]. Let us briefly 
summarize the main findings of these investigations: 

• $q_0$ is a random variable which fluctuates from sample to sample, and its 
distribution depends on $N$ and $T$. 

• For large $N$ and $T$ and their ratio kept constant, $E q_0^2 = (1 - N/T)^{-1}$ ($E$ denotes 
the average over sample fluctuations). 

• In the same limit ($N/T$ is held constant and $N \to \infty$) the variance of $q_0^2$ vanishes. 

In other words, the estimation error $q_0$ is a self-averaging quantity, and for large $N$ 
and $T$ its average only depends on the ratio $r = N/T$. The divergence of $q_0$ in the 
limit $r \to 1$ can be regarded as the manifestation of an algorithmic phase transition, 
with a critical point $r_c = 1$ and a critical exponent $-1/2$ for the estimation error 
$q_0 \sim (r_c - r)^{-1/2}$. 

Further studies of the noise sensitivity of portfolio optimization led to the 
conclusion that the critical behavior of the estimation error is similar to the above 
for a number of other risk measures (e.g. mean absolute deviation, maximal loss, 
non-parametric Expected Shortfall [13]) and data generating processes (e.g. GARCH 
[25]). As we shall see in the following section, parametric ES and VaR also belong to 
the same universality class. 
4.2. The replica approach 

Averaging over samples is the same as what is called quenched averaging (see e.g. [25]) 
in the statistical physics of disordered systems. Therefore, the heuristic replica method 
that has been so successful in that field can also be used effectively to investigate the 
noise sensitivity of portfolio optimization [23, 27]. In this section, we are going to 
employ the replica approach 1) to determine under what circumstances the optimum 
exists, and 2) to compute $q_0$ provided that there is an optimum. The computations 
will be performed in the 'thermodynamic' limit, that is when $N \to \infty$ while $r = N/T$ 
is finite and fixed. 

For the sake of simplicity we are going to assume that the data generating 
distribution is iid standard normal, in other words, the elements $x_{it}$ of the sample 
matrix are identically distributed and mutually independent standard normal random 
variables. (We use these assumptions to make our argument more straightforward, 
but as we shall see in the next subsection, introducing correlations into the model does 
not affect our main results.) Since in this case $\mu_i = 0$ and $\sigma_{ij} = \delta_{ij}$, the true risk of a 
portfolio $\{w_i\}$ will be 

$$\mathcal{R}_\phi(\{w_i\}) = \phi \sqrt{\sum_{i=1}^{N} w_i^2}. \tag{14}$$



For later convenience, we are going to use a modified form of the budget constraint: 

$$\sum_{i=1}^{N} w_i = N, \tag{15}$$

which obviously does not change the nature of the optimization problem (it only 
rescales the result by a factor of $N$). Thus, the minimum of (14) subject to (15) will 
be the portfolio with weights $w_1^* = w_2^* = \dots = w_N^* = 1$, and the minimal risk will be 
$\mathcal{R}_\phi^* = \phi\sqrt{N}$. Hence, for a standard normal data generating distribution the distance 
of a portfolio $\{w_i\}$ from the true optimum is given by 

$$q_0^2 = \frac{1}{N} \sum_{i=1}^{N} w_i^2. \tag{16}$$

(It is worth noting that in the special case of iid standard normal returns, we get 
exactly the same formula, if we measure the risk by standard deviation.) 
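
For iid standard normal returns the distance measure therefore depends on the weights alone, which makes it easy to evaluate. The helper below is a small sketch of this computation; the name `q0_iid` and the decision to rescale arbitrary weights so that they sum to $N$ are assumptions made for the example.

```python
import math


def q0_iid(weights):
    """Ratio of the true risk of `weights` to the true optimal risk when the
    returns are iid standard normal.

    The weights are first rescaled so that they sum to N (the number of
    assets); the result is then sqrt(sum(w_i^2) / N).
    """
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    return math.sqrt(sum(w * w for w in scaled) / n)
```

Equal weights give exactly 1, and any deviation from them gives a value above 1; for instance, putting everything into one of two assets gives $\sqrt{2}$.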

It is clear that VaR/ES optimization based on a sample $\{x_{it}\}$ can be regarded as 
a statistical physics problem. Combining equations (6), (11) and (12) the Hamiltonian 
of the problem can be written as 

N T 

H ({wi}, z, r\\ {x it }) = N(j)^/z - 




Tz 



(17) 



where we replaced the factor $1/(T-1)$ by $1/T$ in equation (12), which makes no 
difference in the thermodynamic limit. (The budget constraint is not explicitly 
included in the Hamiltonian, but it will be taken into account soon.) We are interested 
in finding the ground state of this system. It is expedient, however, first to introduce 
a fictitious inverse temperature $\beta$ and work out the partition function $Z_\beta$ for finite 
temperature. The partition function is a functional of $\phi(\alpha)$ and the sample $x_{it}$: 

/oo N /*oo poo / N \ 

|f dwi / / dr,8 VV - N) e -0n{m},z,m{x it} ) = 
-oc i=1 Jo Jo \i=l / 

/oo /-oo 
IJd«, 4 / dXe iX (^=^- N h^^^ w ^ x (18) 
-OO „-_i J —oo 



dz / c??7e 



jvw^+^[ELi(Ef = i(^-TSr=i ^>.) 2 - Tz ] 



Then the risk at the optimum, estimated from sample $\{x_{it}\}$, is computed as: 

$$\hat{\mathcal{R}}_\phi^* = -\lim_{\beta \to \infty} \frac{1}{\beta N} \log Z_\beta[\phi; \{x_{it}\}]. \tag{19}$$

This is nothing but the free energy density at zero temperature (i.e. the ground state 
energy density). 

The free energy and all the "thermal averages" one can derive from it depend on 
the random sample. In general, one is interested in computing averages over the sample 
fluctuations (e.g. $E q_0$), so we have to average the free energy over the random samples. 
To obtain $E \hat{\mathcal{R}}_\phi^*$ we have to compute $E \log Z_\beta[\phi; \{x_{it}\}]$. Averaging the logarithm of a 
random variable is a hard task. The replica method (see e.g. [55]) was invented to 
circumvent this difficulty by the use of the identity 

$$\log Z = \lim_{n \to 0} \frac{Z^n - 1}{n}, \tag{20}$$

and computing $E Z^n$ for positive integer $n$, which is a relatively simple task. In order 
to be able to take the $n \to 0$ limit, ultimately one has to analytically continue to real 
$n$. The name of the method derives from the fact that $Z^n$ is the partition function 
of a system that consists of $n$ identical copies (replicas) of the original problem. The 
Achilles heel of the method is the analytic continuation whose uniqueness usually 
cannot be guaranteed; we will justify its use ex post by the simulation results to be 
presented in the next section. 
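
The identity behind the replica trick is easy to check numerically on a toy example: for a handful of positive numbers playing the role of sample-dependent partition functions, the average of $\log Z$ is approached by $(E Z^n - 1)/n$ as $n$ becomes small. The snippet below is only a numerical illustration of this limit with real $n$; it does not reproduce the analytic continuation from integer $n$, and the function name `replica_average_log` is introduced here for the example.

```python
import math


def replica_average_log(samples, n):
    """Approximate the average of log(Z) over `samples` by (E[Z^n] - 1) / n."""
    moment = sum(z ** n for z in samples) / len(samples)   # E[Z^n]
    return (moment - 1.0) / n


if __name__ == "__main__":
    zs = [1.0, 2.0, 4.0, 8.0]
    exact = sum(math.log(z) for z in zs) / len(zs)
    for n in (1.0, 0.1, 0.01, 0.001):
        print(n, replica_average_log(zs, n), exact)
```

The approximation error shrinks roughly in proportion to $n$, which is what the expansion $Z^n = 1 + n \log Z + O(n^2)$ suggests.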

The sample elements $x_{it}$ are independent and identically distributed random 
variables, so assuming a variance of $1/N$ their joint probability density function 
is 

$$P(\{x_{it}\}) = \left( \frac{N}{2\pi} \right)^{NT/2} \exp\left( -\frac{N}{2} \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it}^2 \right). \tag{21}$$

We can compute $E Z^n$ by expressing $Z^n$ as the product of $n$ independent, identical 
integrals over the replicated variables $w_i^a$, $z^a$ and $\eta^a$ ($a = 1, 2, \dots, n$), and then taking 
its average with respect to the density function (21). After computing several Gaussian 
integrals we arrive at the expression 

/oo pioo poo poo 

dQ ab / dQ ab / dz / d V e NG ^ ab ^^-^^ a » (22) 
-oo J-ioo Jo Jo 

where we omitted the normalizing factor and used the notations 
Gp({Q ab },{Q ab },{z a }Av a }) = 

= J2 Q ab (Q ab - !) - \ Tr log Q - Yr Tr lo S Q - (23) 

a,b=l 
-is e + - E ^ + n log MiQ ab h m), 

a—l a—1 

and 
Af3({Q ab },{v a }) 



duf exp 



1 n T n T 

Je (q-tt<^+^ee< 



.,6=1 



a=l t=l 



*--M> ; EE • ; yE'/" E"- 

a=l t=l a=l \t=l / 

(24) 

Here we introduced the so-called overlap matrix 

$$Q^{ab} = \frac{1}{N} \sum_{i=1}^{N} w_i^a w_i^b \tag{25}$$



and its conjugate $\hat{Q}^{ab}$, which is a Lagrange multiplier to enforce the equality above. As 
we are interested in the $N \to \infty$ limit, we can use the saddle point method to compute 
the integral (22). Since we are dealing with a convex optimization problem, we expect 
that the saddle point is replica symmetric, that is we assume that $Q^{ab} = q + \Delta q\,\delta^{ab}$, 
$\hat{Q}^{ab} = \hat{q} + \Delta\hat{q}\,\delta^{ab}$, $\eta^a = \eta$ and $z^a = z$. After eliminating $\hat{q}$ and $\Delta\hat{q}$ by partial 
extremization, we get $G_\beta(q, \Delta q, z, \eta) = n\,(g_0 + \beta g_\beta(q, \Delta q, z, \eta)) + O(n^2)$, where $g_0$ 
is some constant and 



Af3(q ,Aq,r]) 



1 



1 - r 



2f3Aq 2(3r 

-<j)y/z + -ZT) + 



logA, + £)- 



1 



(3Nn 



\ogA{q,Aq,r)) 



(26) 



dul exp 



2Aq'- 



Tin \ " 7i T 



t=l \a=l 



x exp 



-^EE< 2 +^E^+^E|E< 



a=l t=l 



a=l \t=l 



(27) 



In the thermodynamic limit, the optimum can be obtained by minimizing the free 
energy density, which works out to be 



fp(q,Aq,z,r]) = -- lim -j- lim g p (q, Aq,z,r}). 



(28) 



In this limit $\log A(q, \Delta q, \eta)/(Nn)$ can be computed explicitly by performing the 
Hubbard-Stratonovich transformation twice to linearize quadratic terms in the 
exponent of the integrand, then computing a few more Gaussian integrals and 
approximating the logarithm function by its series expansion around 1. Finally we get 



f ( q ,A,z, V ) = ± + ^ 
^log 



1 

~Yr 



1 1 A , 1 

J3 l0g j + A 

2ttA \ 
1 + 2 V A J 



A log ^ + 4" ) + 4>\fz - — ZT) 

r 



Ar 2 



A + 277A 2 ' ~ J (29) 

where we introduced the variable $\Delta = \beta \Delta q$. It is clear that in the zero temperature 
($\beta \to \infty$) limit the free energy density is finite only if $\Delta$ remains a non-zero, finite 
constant. (In other words, the difference between the diagonal and non-diagonal 
elements of the replica matrix is proportional to $\beta^{-1}$, therefore, it vanishes in the zero 
temperature limit.) 

Figure 2. Left panel: The curve of critical $N/T$ values as a function of $\phi$. Right 
panel: The curves of critical $N/T$ values as a function of $\alpha$ for VaR and ES. For 
comparison purposes, the critical curve of the historical ES optimization problem 
is also plotted with a black dashed line. 

Introducing the new variables $\eta' = 2\eta\Delta$ and $q' = q/\Delta^2$ we obtain the zero 
temperature free energy density in the form: 

1 A 



fc\(q' , A, z, n) = 6\fz ^— -t r — 

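The saddle point conditions now read 

$$\frac{\partial f}{\partial q'} = \frac{\partial f}{\partial \Delta} = \frac{\partial f}{\partial z} = \frac{\partial f}{\partial \eta'} = 0,$$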
which implies that the solution is 



1 



1 



if 



l + r + )q' + r z . (30) 



<1 

A* 



[(1 



-1/2 



(l-r) s 



(31) 

(32) 
(33) 
(34) 

(35) 

From (33) it is clear that the saddle point method is only meaningful, if $(1 - r)\phi^2 - r > 0$. 
That is, in the thermodynamic limit, for each value of $\phi$ there is a critical value 
$r_c$ of $r = N/T$ so that the optimization problem is not feasible unless $r < r_c$. (This 
stability condition corresponds to (7) in the thermodynamic limit.) Equation (33) 
implies that the critical values $r_c$ are on the curve 

$$r_c(\phi) = \frac{\phi^2}{1 + \phi^2}, \tag{36}$$

which divides the $r$ vs $\phi$ plane into two distinct phases: one in which the optimization 
is feasible and another one in which it is not. The implied phase diagrams can be 
seen in Figure 2. The left panel shows the phase boundary in the $r$ vs $\phi$ plane. It 
is interesting to take a look at the asymptotic behavior of $r_c(\phi)$: as it increases in a 
strictly monotonic manner and $\lim_{\phi \to \infty} r_c(\phi) = 1$, it is clear that $r_c(\phi) < 1$ for any 
finite $\phi$. In other words, for any confidence level $\alpha < 1$ (whether we are dealing with 
VaR or ES) the minimal length of the time series that ensures the existence of the 
optimum must be greater than $N$. 

Substituting the formulas in (2) into (36) we get the phase boundaries of VaR 
and ES, respectively, in the $r$ vs $\alpha$ plane (right hand side of Figure 2). It can be seen 
that parametric VaR optimization is more unstable than the parametric optimization 
of ES, although for practically relevant values of $\alpha$ (that is in the $\alpha > 0.9$ range) the 
difference is not very significant. (For instance, for $\alpha = 99\%$ the critical value $r_c$ is 
about 0.844 and 0.877 for VaR and ES respectively.) An interesting feature of both 
phase diagrams is that close to $\alpha = 1$ they tend to $r = 1$ with infinite derivatives. 
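
The quoted critical values can be reproduced directly from the feasibility condition $(1 - r)\phi^2 - r > 0$, which is equivalent to $r < \phi^2/(1 + \phi^2)$. The sketch below evaluates this bound for VaR and ES; the function names, including the helper `min_sample_size` that turns the bound into a smallest admissible $T$ for a given $N$, are introduced here for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def phi_var(alpha):
    """phi(alpha) for parametric VaR."""
    return STD_NORMAL.inv_cdf(alpha)


def phi_es(alpha):
    """phi(alpha) for parametric ES."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(alpha)) / (1.0 - alpha)


def critical_ratio(phi):
    """Largest N/T (exclusive) for which (1 - r) * phi^2 - r > 0 holds."""
    return phi * phi / (1.0 + phi * phi)


def min_sample_size(n_assets, phi):
    """Smallest integer T with n_assets / T strictly below the critical ratio."""
    return math.floor(n_assets / critical_ratio(phi)) + 1


if __name__ == "__main__":
    for alpha in (0.9, 0.95, 0.99):
        print(alpha,
              round(critical_ratio(phi_var(alpha)), 3),
              round(critical_ratio(phi_es(alpha)), 3))
```

Note that this bound only concerns the existence of the optimum in the thermodynamic limit; a much longer series is needed to keep the estimation error small.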

The right panel of Figure 2 also shows the phase boundary of historical ES, 
so we can easily compare it to the critical curve of parametric ES. It is clear that 
the non-parametric phase curve is below the parametric one for any confidence level 
$\alpha$, therefore the parametric estimation is more stable. In other words, a shorter 
time series is enough to ensure the feasibility of portfolio optimization, if parametric 
ES estimation is used. This was to be expected, but it is important to stress that 
although parametric fitting reduces the chance that there is no optimum for a given 
sample (especially for larger values of $\alpha$), it fails to completely eliminate the feasibility 
problem originally encountered in historical estimation [13]. 

Let us now derive the sample average of the noise sensitivity measure $q_0^2$ in the 
thermodynamic limit, provided the optimum exists. Let us denote this conditional 
sample average by $E$. From (16) and the replica symmetric ansatz it follows that 
$q_0^2 = q + \Delta/\beta$. Therefore, in the $\beta \to \infty$ limit we find that the conditional average of 
the estimation error of the optimal portfolio is 

$$E q_0^2 = q'^{*} \cdot \Delta^{*2} = \frac{\phi^2(\alpha)}{(1 - r)\phi^2(\alpha) - r} = \frac{r_c(\alpha)}{r_c(\alpha) - r}. \tag{37}$$

That is, $q_0 \sim (r_c - r)^{-1/2}$, so the estimation error of the parametric VaR and ES 
optimization displays the same critical behavior as the minimization of variance, mean 
absolute deviation, maximal loss and non-parametric ES. More generally, it is very 
probable that the parametric ES and VaR belong to the same universality class as 
the aforementioned risk measures, which would imply that $q_0$ is self-averaging (that is 
its variance vanishes in the thermodynamic limit) also here. This is clearly supported 
by numerical simulations and should be possible to confirm by a (very hard) replica 
calculation which is, however, beyond the scope of this paper. 
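
To get a feeling for the size of the estimation error, the average in the thermodynamic limit can be evaluated as $q_0 = [\phi^2 / ((1 - r)\phi^2 - r)]^{1/2}$, which reduces to the variance-optimization result $(1 - r)^{-1/2}$ as $\phi \to \infty$. The function below is a small sketch of this evaluation; its name and the convention of returning `None` in the unfeasible phase are choices made for the example.

```python
import math


def estimation_error(phi, r):
    """Average q_0 in the thermodynamic limit for ratio r = N/T.

    Returns None when (1 - r) * phi^2 - r <= 0, i.e. at or beyond the
    critical ratio, where the optimization is not feasible.
    """
    denom = (1.0 - r) * phi * phi - r
    if denom <= 0:
        return None
    return math.sqrt(phi * phi / denom)


if __name__ == "__main__":
    phi = 2.6652   # approximately phi for ES at alpha = 0.99
    for r in (0.1, 0.3, 0.5, 0.7, 0.85):
        print(r, estimation_error(phi, r))
```

For $\phi = 2$, for example, the critical ratio is $0.8$ and at $r = 0.4$ the formula gives $q_0 = \sqrt{2}$, that is a risk about 41% above the true optimum.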



4.3. Correlated asset returns 

In this section we are going to show that the results presented above will not change, 
if we allow asset returns to be correlated. We still assume that the data generating 
process is iid normal with zero expectations, but now we allow the covariancc 
matrix o~ij to be any strictly positive definite matrix. Therefore has a Cholesky- 
decomposition, in other words, there is a lower tirangular matrix Dij so that 

N 

o-ij =J2 D ik D jk- (38) 
fc=i 
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Let i/it (i = 1, 2, JV, t = 1, 2, T) be a sample of normally distributed asset returns 
with zero mean and covariances . It is easy to see that the variables 

JV 

have a standard normal distribution, moreover, the observed return of a portfolio {i^} 
over the time period t can be written as 

JV JV JV N 

= u. Ai^t = e w i x it, (40) 

z— 1 i—1 j — 1 i—1 

where we introduced the notation 



JV 

£ 



VjDji. (41) 



Hence, the matrix $D_{ij}$ defines a linear transformation under which the scalar products
between the asset return vectors and the portfolio vectors are invariant. This
immediately implies that the Hamiltonian (17) as well as $q_0^2$ are also invariant under
this transformation, because they depend on the observed asset returns and the
portfolio vector only through their scalar products. (It is important to bear in mind that
this is true only if the expected values of the asset returns are zero.) The budget
constraint is not invariant, however, and it takes the form

$$\sum_{i=1}^{N} \sum_{j=1}^{N} (D^{-1})_{ji}\, w_j = N. \tag{42}$$

The financial interpretation of this result is straightforward. For each $i$ the vector
defined by $\{d^{(i)}_j\}_j = \{(D^{-1})_{ij}\}_j$ can be regarded as a portfolio. Then $x_{it}$ denotes
the return of $\{d^{(i)}_j\}_j$ in the time interval $t$. So the vector $\{w_i\}$ is an equivalent
representation of the portfolio $\{v_i\}$, but while the latter is expressed in terms of
the original, correlated assets, the numbers $\{w_i\}$ specify the weights of the standard
normal assets $\{d^{(i)}_j\}_j$. Since the vectors $\{d^{(i)}_j\}_j$ are not normalized in general (their
components do not sum to unity), the weights $\{w_i\}$ are measured in different units
than $\{v_i\}$. This is why the components $w_i$ have to be rescaled in the transformed
budget constraint (42).
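The change of variables described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the covariance matrix, the portfolio weights, the sample size and all variable names are hypothetical choices made only for illustration. It builds a Cholesky factor $D$ of a positive definite matrix, maps correlated returns $y$ to $x = D^{-1}y$, and confirms that the portfolio returns coincide when the weights are transformed as $w = D^{\mathsf{T}} v$, and that the transformed weights satisfy the rescaled budget constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5, 50_000  # hypothetical number of assets and observations

# Hypothetical strictly positive definite covariance matrix (illustration only).
a = rng.standard_normal((n, n))
sigma = a @ a.T + n * np.eye(n)

# Lower triangular Cholesky factor: sigma = d @ d.T.
d = np.linalg.cholesky(sigma)

# Correlated returns y_it and the transformed variables x = D^{-1} y.
y = d @ rng.standard_normal((n, t))
x = np.linalg.solve(d, y)

# A hypothetical portfolio v with budget constraint sum_i v_i = N,
# and its representation w_i = sum_j v_j D_ji in the new variables.
v = rng.random(n)
v *= n / v.sum()
w = d.T @ v

# Portfolio returns agree in both representations.
returns_match = np.allclose(v @ y, w @ x)

# Transformed budget constraint: sum_i sum_j (D^{-1})_ji w_j = N.
budget_match = np.isclose(np.linalg.solve(d.T, w).sum(), n)

# The transformed variables are approximately uncorrelated with unit variance.
x_cov_error = np.abs(np.cov(x, bias=True) - np.eye(n)).max()
```

Here `x_cov_error` is a sampling error and shrinks as the number of observations grows, whereas the two identities hold up to floating point rounding for any sample.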

As a result, the partition function for the sample $\{y_{it}\}$ of correlated asset returns
can be expressed in terms of the standard normal variables $x_{it}$:

/OO p OG 

[] dwi / d\e* x (^= i »< £f =1 ET=i ^* x (43) 

1 dz r df]e N ^+^[^(^(^-* x*.) w 3 - t *] . 



This expression is very similar to (18), and the replica calculations presented in the
previous section can be repeated to derive the quenched average of the free energy:

7 1 — r ( 1 A q\ _1 
fp(q, A, z, 77) = — + — — - log — + — + 0Vz - -zi) + 



2A 2r \/3 j3 A J ^ v r 



1 

2r 



1 , / 2ttA 



(44) 
which differs from (29) only in the first term, $\gamma/(2\Delta)$, where

$$\gamma^{-1} = \lim_{N\to\infty} \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma^{-1}_{ij}. \tag{45}$$

This means that for different values of $N$ the covariance matrix $\sigma_{ij}$ can be chosen freely;
the only restriction is that the limit (45) must exist. Moreover, without loss of
generality, the existence of this limit can always be ensured by fixing a positive number
$\gamma$ at our convenience and rescaling the asset returns by a constant for each finite value
of $N$ so that $N^2 / \sum_{i,j} \sigma^{-1}_{ij}$ is equal to $\gamma$. Since $\gamma$ is arbitrary, it is clear that the phase
boundary as well as $E(q_0^2)$ must not depend on it. In fact, the minimization of the
free energy density (44) yields exactly the same results as in the uncorrelated case,
namely (36) and (37).

Finally, it is worth noting that the free energy density of the uncorrelated problem
can be recovered from (44) by letting $\sigma_{ij} = N^{-1}\delta_{ij}$, so that $\gamma = 1$. The variance scaling
factor $N^{-1}$ is indeed reflected in the joint probability density function (21) used to
average over the uncorrelated asset returns.

4.4. Numerical study

In view of the heuristic character of the replica computation, we feel it is useful to
provide numerical evidence to support its results. In order to do this, we generated
independent samples from a multivariate standard normal distribution ($\mu_i = 0$ and
$\sigma_{ij} = \delta_{ij}$), and attempted to find the minimum of $\mathcal{H}(\{x_{it}\})$ in each sample. For the
sake of simplicity, rather than controlling the value of $\alpha$, we controlled $\phi$ directly. To
measure the probability of the existence of a minimum for a given combination of $N$,
$T$ and $\phi$, we used the following algorithm:

(i) Generate an $N \times T$ sample matrix $\{x_{it}\}$.

(ii) Estimate the means and the covariances from $\{x_{it}\}$ using equations (11) and (12).

(iii) Use the condition (7) to check if the portfolio optimization problem is feasible on
the sample $\{x_{it}\}$.

(iv) Repeat steps (i) to (iii) $K$ times, and count how many times the optimum exists.
Let this number be $L$. Then the estimated probability of feasibility will be
$p(N, T, \phi) = L/K$.

Clearly, the larger K the more accurate the measurement will be. 
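The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the figures. Since the estimators (11), (12) and the feasibility condition (7) are not restated in this section, the sketch assumes the sample mean, the sample covariance with $1/T$ normalization, and the standard boundedness criterion for minimizing $\phi\sigma_p - \mu_p$ under a budget constraint, namely $\phi^2 > C - B^2/A$ with $A = \mathbf{1}^{\mathsf{T}}\hat\sigma^{-1}\mathbf{1}$, $B = \mathbf{1}^{\mathsf{T}}\hat\sigma^{-1}\hat\mu$ and $C = \hat\mu^{\mathsf{T}}\hat\sigma^{-1}\hat\mu$. The function names and the small values of $N$, $T$ and $K$ are hypothetical.

```python
import numpy as np


def is_feasible(x, phi):
    """Assumed feasibility test on an N x T sample: phi^2 > C - B^2 / A."""
    n_assets = x.shape[0]
    mu = x.mean(axis=1)            # estimated means (step ii)
    sigma = np.cov(x, bias=True)   # estimated covariances, 1/T normalization (assumption)
    ones = np.ones(n_assets)
    sigma_inv_mu = np.linalg.solve(sigma, mu)
    sigma_inv_one = np.linalg.solve(sigma, ones)
    a = ones @ sigma_inv_one
    b = ones @ sigma_inv_mu
    c = mu @ sigma_inv_mu
    return bool(phi**2 > c - b**2 / a)  # step (iii)


def feasibility_probability(n_assets, n_obs, phi, n_trials, rng):
    """Estimate p(N, T, phi) = L / K over K independent standard normal samples."""
    hits = 0
    for _ in range(n_trials):
        x = rng.standard_normal((n_assets, n_obs))  # step (i)
        hits += is_feasible(x, phi)
    return hits / n_trials                          # step (iv)


rng = np.random.default_rng(12345)
p_low = feasibility_probability(16, 64, 2.0, 200, rng)   # N/T = 0.25
p_high = feasibility_probability(60, 64, 2.0, 200, rng)  # N/T close to 0.94
```

With $\phi = 2$, the first estimate should be close to 1 and the second close to 0, since the two ratios lie on opposite sides of the critical value.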

The left panel of Figure 3 exhibits simulation results for $\phi = 2$, which corresponds
to confidence levels of $\alpha = 0.9772$ for VaR and $\alpha = 0.9420$ for ES. The number of
iterations was $K = 2000$ and the $p$ vs $N/T$ curve was measured for different values of
$N$ (64, 128, 256 and 512). Based on the previous section, the critical value of $N/T$
is $r_c = 0.8$, that is, in the thermodynamic limit the optimum exists with probability
1 if $N/T < 0.8$ and the probability abruptly drops to 0 at the critical value (this is represented
by the curve labeled $N = \infty$ in the figure). The diagram shows that for finite
values of $N$ and $T$ the probability of the existence of the optimal portfolio decreases
from 1 to 0 continuously. At the same time, as $N$ increases (that is, as we approach
the thermodynamic limit) the fall of the probability from 1 to 0 becomes sharper and
sharper, as expected. The probability curves belonging to different values of $N$ intersect
one another at the same point; therefore, this point must correspond to the critical
value $r_c$. As shown by the figure, the intersection is indeed very close to $r = 0.8$, in
excellent agreement with the analytical results.

Figure 3. Left panel: The estimated probability $p$ of the existence of an optimum
as a function of $N/T$ for $\phi = 2$ and different values of $N$. The curve labeled by
$N = \infty$ corresponds to the thermodynamic limit (as computed by the replica
method). Right panel: Gaussian curve fitting to the measured probabilities for
$N = 128$ and $\phi = 2$.
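The correspondence between $\phi = 2$ and the two confidence levels quoted above can be reproduced with the Python standard library. The sketch assumes the usual Gaussian expressions, which are not restated in this section: $\phi_{\mathrm{VaR}}(\alpha) = \Phi^{-1}(\alpha)$ and $\phi_{\mathrm{ES}}(\alpha) = \varphi(\Phi^{-1}(\alpha))/(1-\alpha)$, where $\Phi$ and $\varphi$ are the standard normal distribution function and density. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, std 1)


def phi_var(alpha):
    """Assumed Gaussian VaR multiplier: the alpha-quantile of the standard normal."""
    return STD_NORMAL.inv_cdf(alpha)


def phi_es(alpha):
    """Assumed Gaussian ES multiplier: density at the alpha-quantile over the tail mass."""
    z = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(z) / (1.0 - alpha)


phi_from_var_level = phi_var(0.9772)
phi_from_es_level = phi_es(0.9420)
```

Under these assumptions both values come out within rounding of 2, which is why a single simulation at fixed $\phi$ covers the VaR and the ES case at once.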

We also observed that the probability curves fit very well to the function
$g_{\mu,\sigma}(x) = 1 - \Phi((x - \mu)/\sigma)$, where $\Phi$ is the cumulative distribution function of
the standard normal distribution, and $\mu$ and $\sigma$ are parameters to be determined (e.g.
via maximum likelihood estimation). The right hand panel shows simulated data
points for $N = 128$ and $\phi = 2$ along with the fitted curve (where $\mu = 0.8028$ and
$\sigma = 0.0446$). It is clear that $g_{\mu,\sigma}(x)$ cannot be the exact model for the $p$ vs $N/T$
curve, since for $N/T > 1$ we have $p = 0$. This fact, however, gradually loses its
significance as $N$ increases and $\sigma$ gets smaller and smaller. As a result, fitting $g_{\mu,\sigma}(x)$
to the numerically computed data points makes it possible to estimate $p$ as a function
of $\phi$ and $N/T$ with high accuracy, even if the number of iterations $K$ is low; this
way simulations can be sped up by a factor ranging from 10 to 100.
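One simple way to fit $g_{\mu,\sigma}$ is sketched below. It is not the maximum likelihood procedure mentioned in the text but a swapped-in, simpler technique: since $\Phi^{-1}(1 - p) = (x - \mu)/\sigma$ is linear in $x$, an ordinary least squares line through the transformed points yields $1/\sigma$ as slope and $-\mu/\sigma$ as intercept. The data points are synthetic and noise-free, generated from the fitted values quoted above purely for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def g(x, mu, sigma):
    """Model curve g(x) = 1 - Phi((x - mu) / sigma)."""
    return 1.0 - STD_NORMAL.cdf((x - mu) / sigma)


def fit_probit_line(xs, ps):
    """Least squares fit of z = Phi^{-1}(1 - p) against x; returns (mu, sigma)."""
    # Points with p equal to 0 or 1 have no finite transform and are skipped.
    pts = [(x, STD_NORMAL.inv_cdf(1.0 - p)) for x, p in zip(xs, ps) if 0.0 < p < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx                    # equals 1 / sigma
    intercept = mean_z - slope * mean_x  # equals -mu / sigma
    sigma = 1.0 / slope
    mu = -intercept * sigma
    return mu, sigma


# Synthetic, noise-free points on a grid of N/T values (illustration only).
xs = [0.70 + 0.02 * k for k in range(11)]
ps = [g(x, 0.8028, 0.0446) for x in xs]
mu_hat, sigma_hat = fit_probit_line(xs, ps)
```

With noisy simulated probabilities, the transformed points near $p = 0$ or $p = 1$ carry large errors, so a weighted fit or a likelihood-based fit would be preferable there.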

Our numerical study showed that around the critical value $r_c(\phi)$ the probability
$p(N, T, \phi)$ follows the behavior displayed in Figure 3 for any value of $\phi$, but the
steepness of the decline from 1 to 0 varies with $\phi$. To demonstrate this, we numerically
computed the contour lines of constant $p$ on the $N/T$ vs $\phi$ plane for $p = 0.1$, 0.3, 0.5,
0.7 and 0.9 with $N = 128$ (the number of iterations was set to $K = 100$, and we fitted
$g_{\mu,\sigma}(x)$ to the simulated data points). The results are shown on the left hand side
of Figure 4. Comparing this diagram to the left panel of Figure 2, it is evident that
the contour lines are arranged around, and have a similar shape to, the theoretical
phase boundary. As mentioned above, the critical points can be estimated as the
intersections of the $p$ vs $N/T$ curves for different values of $N$. The green points on the
right hand panel were numerically computed by fitting $g_{\mu,\sigma}(x)$ to simulated data with
$N = 64$ and $N = 128$, and then calculating the intersection of the two fitted curves
(the number of iterations was $K = 100$). The estimated critical points (in green) and
the computed phase boundary (in blue) line up very well, which confirms the validity
of the results obtained through the replica method.
4.5. A note on semivariance

Semivariance is one of the oldest downside risk measures. As we shall see, the results 
obtained in the previous sections can be directly applied to characterize the stability of 
portfolio optimization under semivariance, when it is estimated by parametric fitting. 
The definition of semivariance is 

$$\nu^2(X) = E\left[\max\{0, X - E(X)\}\right]^2, \tag{46}$$

where $X$ is a random variable representing the return of some security. The quantity
$\nu$ is simply called the semi standard deviation, and it can be used to define the
following VaR/ES-like risk measure (which is sometimes called semivariance too,
leading to some confusion):

$$\rho(X) = \nu(X) - E(X). \tag{47}$$

When the variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$,
the semi standard deviation is simply $\nu = \sigma/\sqrt{2}$, so the risk measure $\rho$ can be written
as

$$\rho(X) = \frac{1}{\sqrt{2}}\,\sigma - \mu, \tag{48}$$

which is exactly of the same form as (1) with $\phi = 1/\sqrt{2} \approx 0.71$.
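The value of the semi standard deviation for a normal variable can be verified in one line. Writing $u = X - \mu$, which is normal with zero mean and variance $\sigma^2$, and reading the square in the definition of semivariance as acting inside the expectation, the symmetry of the Gaussian density gives:

```latex
\nu^2(X)
  = \int_{0}^{\infty} u^2 \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-u^2/(2\sigma^2)} \, du
  = \frac{1}{2} \int_{-\infty}^{\infty} u^2 \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-u^2/(2\sigma^2)} \, du
  = \frac{\sigma^2}{2},
\qquad\text{so}\qquad
\nu(X) = \frac{\sigma}{\sqrt{2}} .
```

The same result holds if the downside deviation $\max\{0, E(X) - X\}$ is used instead, again because the density is symmetric about the mean.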

This implies immediately that in the case of semivariance minimization the critical
value of $N/T$ is $r_c = 1/3$, that is, for large $N$ (i.e. close to the thermodynamic limit)
we need a time series that is at least three times as long as the number of assets
in the portfolio in order to have a meaningful optimization problem. Moreover, the
conditional average of $q_0^2$ will be $E q_0^2 = (1/3 - N/T)^{-1}/3$.
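To make these numbers concrete, the short sketch below evaluates the semivariance case for a hypothetical portfolio. It assumes that the conditional average has the form $E q_0^2 = \frac{1}{3}\,(1/3 - N/T)^{-1}$ and that this large-$N$ expression is applied to finite $N$ and $T$ only as a rough guide; the function names and the example sizes are illustrative.

```python
R_CRITICAL = 1.0 / 3.0  # critical N/T for semivariance, as quoted in the text


def min_sample_length(n_assets):
    """Smallest integer T for which N/T lies strictly below 1/3."""
    return 3 * n_assets + 1


def expected_q0_squared(n_assets, n_obs):
    """Assumed conditional average E q0^2 = (1/3) / (1/3 - N/T), valid for N/T < 1/3."""
    r = n_assets / n_obs
    if r >= R_CRITICAL:
        raise ValueError("N/T is at or beyond the critical ratio 1/3")
    return R_CRITICAL / (R_CRITICAL - r)


# Hypothetical example: 100 assets.
t_min = min_sample_length(100)           # shortest usable time series
q0_sq = expected_q0_squared(100, 600)    # N/T = 1/6, halfway to the boundary
```

In this example a sample twice as long as the bare minimum still roughly doubles the squared risk ratio relative to the noise-free optimum, which illustrates how slowly the estimation error decays away from the phase boundary.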

5. Summary 

We studied the feasibility and noise sensitivity of portfolio optimization under Value- 
at-Risk, Expected Shortfall and semivariance in the case when these risk measures 
are estimated from finite samples using parametric fitting. Similarly to earlier studies 
based on non-parametric estimation [T21 [S3] we first assumed independent standard 
normal asset returns, but in our present work we generalized our results for correlated 
returns as well. We found that the probability that the optimum exists on a given 
finite sample is smaller than unity, and this probability is a function of the portfolio
size, the sample size and the confidence level of VaR/ES. In the thermodynamic limit
(where the portfolio size $N$ goes to infinity but its ratio to the sample size $T$ is held
constant), this probability converges to either 0 or 1 depending on $N/T$ and the
confidence level $\alpha$. We employed the replica method to compute the equation of the
curve separating the feasible and unfeasible regions on the $N/T$ vs $\alpha$ plane, and also
tested and supported the result by numerical simulation. The replica approach also
enabled us to compute the average of the measure of noise sensitivity $q_0^2$, contingent on
the feasibility of the optimization problem. It is highly probable that the parametric
ES, VaR and semivariance optimization problems belong to the same universality class
as the optimization of many other risk measures (standard deviation, mean absolute
deviation, maximal loss, non-parametric ES): we found that the estimation error blows
up with a critical exponent $-1/2$ as we approach the phase boundary.

Our results make it possible to compare the parametric and historical estimator 
of ES. It is clear that parametric estimation does not eliminate the instability of 
the historical estimator, but it does improve on it, in that the phase diagram of 
parametric ES runs above the historical curve. This means that for a given confidence 
level and a given portfolio size we need more data (longer time series) in the historical 
estimation than in the parametric one, in order to have a meaningful solution to the 
optimization problem. It seems as if we had some additional source of information 
in the parametric case. (The effect is even more pronounced in the case of VaR, 
where the historical estimate cannot be guaranteed to be convex for any confidence 
level and any length of the time series, whereas the parametric estimate has been 
shown here to have an optimum at least in a certain region of parameter space.) One 
may wonder where this additional information may have come from. The answer is 
simple: in the historical estimation we do not make any assumption about the nature 
of the underlying distribution, we are just using raw data as they are produced by 
the data generating process. In contrast, in the parametric case we assume that the 
process is Gaussian and fit the data to this assumption. This way we are projecting 
a nontrivial piece of information into the estimation. For technical reasons we have 
indeed chosen a Gaussian underlying process in the context of this work, but in a 
real market return fluctuations are neither Gaussian nor even stationary. To project
an arbitrary distribution into real, parsimonious data may produce apparently more 
stable estimates, but the gain may well turn out to be completely illusory and the 
results misleading. 

We would also like to draw attention to the fact that the critical value of the 
N/T ratio depends on the risk measure and on the (historical or parametric) method 
of estimation. This critical ratio is never larger than 1, and, depending on the risk 
measure and on the confidence level, it may be significantly smaller; e.g., as we have
just seen, for the semivariance it is as low as 1/3. This means that, depending on
the risk measure, we need time series longer than two or three times the size of the 
portfolio, in order to have a solution at all, and much longer, in order to have a reliable 
estimate. In the context of portfolio selection, where, by the very nature of the task, 
the sampling frequency cannot be higher than once a week or even once a month, 
this condition is not easy to satisfy. Therefore, in practice the typical N/T ratio 
may be fairly close to the phase boundary where the estimation error diverges. The 
knowledge of the phase boundary and the position of our working point (confidence 
level and $N/T$ ratio) relative to it is highly important if we wish to take
sample-to-sample fluctuations properly into account.






This work has presented further evidence for the instability of widely used risk 
measures against sample fluctuations. The instability of parametric VaR, easily the 
most popular risk estimate, is particularly notable. We find it remarkable how powerful 
the concepts and methods imported from the statistical physics of random systems 
prove to be in the analysis of these important phenomena. 